7  Working with Character Values

7.1 Reference

Default database in this chapter is “ex”.

Other databases included in the simulation MySQL server:

  • om
  • ap

This document is based on MySQL 8.4.

MySQL references for string manipulation: https://dev.mysql.com/doc/refman/8.4/en/string-functions.html

7.2 Character Data Type

7.3 Regular Expression

A regular expression (shortened as regex or regexp), sometimes referred to as rational expression, is a sequence of characters that specifies a match pattern in text.

There is no universal guide for regex, as each implementation has their own dialoect/flavour. Here is a one that help you to get started. MySQL Regex Syntax

7.4 String Extraction

SUBSTRING_INDEX

SUBSTRING_INDEX(): Return a substring from a string before the specified number of occurrences of the delimiter.

SELECT emp_name,
    SUBSTRING_INDEX(emp_name, ' ', 1) AS first_name,
    SUBSTRING_INDEX(emp_name, ' ', -1) AS last_name
FROM string_sample
5 records
emp_name first_name last_name
Lizbeth Darien Lizbeth Darien
Darnell O’Sullivan Darnell O’Sullivan
Lance Pinos-Potter Lance Pinos-Potter
Jean Paul Renard Jean Renard
Alisha von Strump Alisha Strump

SUBSTRING

SUBSTRING(): Return the substring between 2 position indexes.

Noitce the following example and the actions:

  • first number is the index position
    • if positive, counting from the left end
    • if negative, counting from the right end
  • second number says how many characters counting down should be returned.
SELECT emp_name,
    SUBSTRING(emp_name, 2, 2),
    SUBSTRING(emp_name, -4, 3)
FROM string_sample
5 records
emp_name SUBSTRING(emp_name, 2, 2) SUBSTRING(emp_name, -4, 3)
Lizbeth Darien iz rie
Darnell O’Sullivan ar iva
Lance Pinos-Potter an tte
Jean Paul Renard ea nar
Alisha von Strump li rum

REGEXP_SUBSTR

REGEXP_SUBSTR(): It is the same as SUBSTRING, but use the regular expression to mathc instead of position index.

For more information about regular expression, see Regular Expression

SELECT emp_name,
  -- match any capital letter, starting at position 3 after the match, return the position 2 after the starting point 
  REGEXP_SUBSTR(emp_name, '[A-Z]', 3, 2)
FROM string_sample
5 records
emp_name REGEXP_SUBSTR(emp_name, ‘[A-Z]’, 3, 2)
Lizbeth Darien b
Darnell O’Sullivan n
Lance Pinos-Potter c
Jean Paul Renard n
Alisha von Strump s

7.5 String Mutation

LPAD/RPAD

Make sure all text strings are extended to the same length, either:

  • LPAD(): add characters on the left hand.
  • RPAD(): add characters on the right hand.

Example: the following adds leading 0s to emplid Before any update to emp_id:

SELECT emp_id, emp_name
FROM string_sample
ORDER BY emp_id
5 records
emp_id emp_name
1 Lizbeth Darien
17 Lance Pinos-Potter
2 Darnell O’Sullivan
20 Jean Paul Renard
3 Alisha von Strump

After padding 0s added to emp_id:

SELECT LPAD(emp_id, 2, '0') AS emp_id, emp_name
FROM string_sample
ORDER BY emp_id
5 records
emp_id emp_name
01 Lizbeth Darien
02 Darnell O’Sullivan
03 Alisha von Strump
17 Lance Pinos-Potter
20 Jean Paul Renard

If adding 0s to the right side

SELECT RPAD(emp_id, 3, '0') AS emp_id, emp_name
FROM string_sample
ORDER BY emp_id
5 records
emp_id emp_name
100 Lizbeth Darien
170 Lance Pinos-Potter
200 Darnell O’Sullivan
200 Jean Paul Renard
300 Alisha von Strump

LOWER/UPPER

LOWER(): Returns the string str with all characters changed to lowercase.

UPPER(): Returns the string str with all characters changed to uppercase

SELECT emp_name, 
  LOWER(emp_name) AS lowercase,
  UPPER(emp_name) AS uppercase
FROM string_sample
5 records
emp_name lowercase uppercase
Lizbeth Darien lizbeth darien LIZBETH DARIEN
Darnell O’Sullivan darnell o’sullivan DARNELL O’SULLIVAN
Lance Pinos-Potter lance pinos-potter LANCE PINOS-POTTER
Jean Paul Renard jean paul renard JEAN PAUL RENARD
Alisha von Strump alisha von strump ALISHA VON STRUMP

REPLACE

REPLACE(): Replace occurrences of a specified string.

SELECT emp_name,
  REPLACE(emp_name, ' ', ' <=> ') AS new_name
FROM string_sample
5 records
emp_name new_name
Lizbeth Darien Lizbeth <=> Darien
Darnell O’Sullivan Darnell <=> O’Sullivan
Lance Pinos-Potter Lance <=> Pinos-Potter
Jean Paul Renard Jean <=> Paul <=> Renard
Alisha von Strump Alisha <=> von <=> Strump

REGEXP_REPLACE

REGEXP_REPLACE(): Similar as replace, but use regex to match and identify the part to replace.

Here is a wild example: we are replacing the first match of any letter with that symbol ’ <=> ’.

SELECT emp_name,
  REGEXP_REPLACE(emp_name, '^\w', ' <=> ') AS new_name
FROM string_sample
5 records
emp_name new_name
Lizbeth Darien Lizbeth Darien
Darnell O’Sullivan Darnell O’Sullivan
Lance Pinos-Potter Lance Pinos-Potter
Jean Paul Renard Jean Paul Renard
Alisha von Strump Alisha von Strump

CONCAT

CONCAT(): Glue all argument strings together as 1 string.

CONCAT_WS: Glue all argument strings With Separators.

SELECT emp_name,
    SUBSTRING_INDEX(emp_name, ' ', 1) AS first_name,
    SUBSTRING_INDEX(emp_name, ' ', -1) AS last_name,
    CONCAT_WS(', ',
      SUBSTRING_INDEX(emp_name, ' ', -1),
      SUBSTRING_INDEX(emp_name, ' ', 1)) AS new_name,
    SUBSTRING(SUBSTRING_INDEX(emp_name, ' ', 1), 1, 1) AS initial_first_name,
    SUBSTRING(SUBSTRING_INDEX(emp_name, ' ', -1), 1, 1) AS initial_last_name,
    CONCAT(
      SUBSTRING(SUBSTRING_INDEX(emp_name, ' ', 1), 1, 1), 
      SUBSTRING(SUBSTRING_INDEX(emp_name, ' ', -1), 1, 1)) AS initial
FROM string_sample
5 records
emp_name first_name last_name new_name initial_first_name initial_last_name initial
Lizbeth Darien Lizbeth Darien Darien, Lizbeth L D LD
Darnell O’Sullivan Darnell O’Sullivan O’Sullivan, Darnell D O DO
Lance Pinos-Potter Lance Pinos-Potter Pinos-Potter, Lance L P LP
Jean Paul Renard Jean Renard Renard, Jean J R JR
Alisha von Strump Alisha Strump Strump, Alisha A S AS